Homology Detection via Family Pairwise Search
Abstract
The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall in between these two extremes. The current work introduces a straightforward generalization of pairwise sequence comparison algorithms to the case when multiple query sequences are available. This algorithm, called Family Pairwise Search (FPS), combines pairwise sequence comparison scores from each query sequence. A BLAST implementation of FPS is compared to representative examples of hidden Markov modeling (HMMER) and motif modeling (MEME). The three techniques are compared across a wide range of protein families, using query sets of varying sizes. BLAST FPS significantly outperforms motif-based and HMM methods. Furthermore, FPS is much more efficient than the training algorithms for statistical models.
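The core idea above, combining the pairwise comparison scores obtained from each query sequence into one score per database sequence, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state which combination rule FPS uses, so the rule is left as a parameter (`max` and `mean` are shown as assumed options). The function name `family_pairwise_search`, the input layout, and the `missing` default are hypothetical, and the scores are toy numbers rather than real BLAST output.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical input layout: scores[query_id][target_id] = pairwise score,
# where a higher score means a stronger match (for example, a bit score).
PairwiseScores = Dict[str, Dict[str, float]]


def family_pairwise_search(
    scores: PairwiseScores,
    combine: Callable[[Sequence[float]], float] = max,
    missing: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank database sequences by a combined score over all query sequences.

    `combine` is an assumption: the abstract only says the scores are combined.
    `missing` is the score assumed for a target that a query did not hit.
    """
    # Collect every target that was hit by at least one query.
    targets = sorted({t for hits in scores.values() for t in hits})
    # Combine the per-query scores of each target into a single family score.
    combined = {
        t: combine([hits.get(t, missing) for hits in scores.values()])
        for t in targets
    }
    # Best score first; ties are broken by target identifier.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy example: two query sequences scored against three database sequences.
toy_scores = {
    "q1": {"t1": 50.0, "t2": 10.0},
    "q2": {"t1": 30.0, "t2": 40.0, "t3": 5.0},
}
by_max = family_pairwise_search(toy_scores)
by_mean = family_pairwise_search(toy_scores, combine=mean)
```

Keeping the combination rule as a parameter makes it easy to compare alternatives on the same set of pairwise scores. If the underlying scores were E-values, where lower is better, the sort order and the `missing` default would both need to change.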
Related resources
Family pairwise search with embedded motif models
MOTIVATION: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy, J. Comput. Biol., 5, 479-492, 1998) has sho...
The HHpred interactive server for protein homology detection and structure prediction
HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search resul...
Limits of homology detection by pairwise sequence comparison
MOTIVATION: Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. RESULTS: We have investigated noise levels in pairwise alignment based da...
A work stealing based approach for enabling scalable optimal sequence homology detection
Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or "homologous") on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment fo...
A better gap penalty for pairwise SVM
SVM-Pairwise was a major breakthrough in remote homology detection techniques, significantly outperforming previous approaches. This approach has been extensively evaluated and cited by later works, and is frequently taken as a benchmark. No known work, however, has examined the gap penalty model employed by SVM-Pairwise. In this paper, we study in depth the relevance and effectiveness of SVM-Pa...
Journal title: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology
Volume 5, Issue 3
Pages: -
Publication date: 1998